Search results for "approximate string matching"

showing 9 items of 9 documents

Multi-dimensional pattern matching with dimensional wildcards

1995

We introduce a new multi-dimensional pattern matching problem, which is a natural generalization of the on-line search in string matching. We are given a text matrix A[1: n1, ..., 1:n d ] of size N= n1×n2×...×n d , which we may preprocess. Then, we are given, online, an r-dimensional pattern matrix B[1:m1,...,1:m r ] of size M= m1×m2×...×m r , with 1≤r≤d. We would like to know whether B*=B*[*, 1:m1,*, ...,1: mr, *] occurs in A, where * is a dimensional wildcard such that B* is any d-dimensional matrix having size 1 × ... × m1×...1×m r ×...1 and containing the same elements as B. Notice that there might be (d/r)≤2d occurrences of B* for each position of A. We give CRCW-PRAM algorithms for pr…

business.industryGeneralizationCommentz-Walter algorithmPattern recognitionWildcard characterString searching algorithmcomputer.file_formatApproximate string matchingBinary logarithmCombinatoricsMatrix (mathematics)Artificial intelligencePattern matchingbusinesscomputerMathematics
researchProduct

Languages with mismatches

2007

AbstractIn this paper we study some combinatorial properties of a class of languages that represent sets of words occurring in a text S up to some errors. More precisely, we consider sets of words that occur in a text S with k mismatches in any window of size r. The study of this class of languages mainly focuses both on a parameter, called repetition index, and on the set of the minimal forbidden words of the language of factors of S with errors. The repetition index of a string S is defined as the smallest integer such that all strings of this length occur at most in a unique position of the text S up to errors. We prove that there is a strong relation between the repetition index of S an…

Combinatorics on wordsApproximate string matchingGeneral Computer ScienceRepetition (rhetorical device)String (computer science)Search engine indexingComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Approximate string matchingData structureTheoretical Computer ScienceCombinatoricsSet (abstract data type)Formal languagesCombinatorics on words Formal languages Approximate string matching IndexingIndexingWord (group theory)MathematicsInteger (computer science)Computer Science(all)Theoretical Computer Science
researchProduct

Indexed Two-Dimensional String Matching

2016

Settore INF/01 - InformaticaTwo-dimensional index data structuresString searching algorithm0102 computer and information sciences02 engineering and technologyApproximate string matching01 natural sciencesCombinatorics010201 computation theory & mathematicsIndex data structures for matrices or imageIndexing for matrices or image0202 electrical engineering electronic engineering information engineeringTwo-dimensional indexing for pattern matching020201 artificial intelligence & image processingString metricMathematics
researchProduct

"Indexing structures for approximate string matching

2003

In this paper we give the first, to our knowledge, structures and corresponding algorithms for approximate indexing, by considering the Hamming distance, having the following properties. i) Their size is linear times a polylog of the size of the text on average. ii) For each pattern x, the time spent by our algorithms for finding the list occ(x) of all occurrences of a pattern x in the text, up to a certain distance, is proportional on average to |x| + |occ(x)|, under an additional but realistic hypothesis.

CombinatoricsCombinatorics on wordsPattern recognition (psychology)Search engine indexingAutomata theoryHamming distanceString searching algorithmApproximate string matchingTime complexityMathematics
researchProduct

From Nerode's congruence to Suffix Automata with mismatches

2009

AbstractIn this paper we focus on the minimal deterministic finite automaton Sk that recognizes the set of suffixes of a word w up to k errors. As first result we give a characterization of the Nerode’s right-invariant congruence that is associated with Sk. This result generalizes the classical characterization described in [A. Blumer, J. Blumer, D. Haussler, A. Ehrenfeucht, M. Chen, J. Seiferas, The smallest automaton recognizing the subwords of a text, Theoretical Computer Science, 40, 1985, 31–55]. As second result we present an algorithm that makes use of Sk to accept in an efficient way the language of all suffixes of w up to k errors in every window of size r of a text, where r is the…

General Computer ScienceOpen problem[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]0102 computer and information sciences02 engineering and technologyString searching algorithm01 natural sciencesTheoretical Computer ScienceCombinatoricsDeterministic automatonSuffix automata0202 electrical engineering electronic engineering information engineeringCombinatorics on words Indexing Suffix Automata Languages with mismatches Approximate string matchingMathematicsDiscrete mathematicsCombinatorics on wordsApproximate string matchingSettore INF/01 - InformaticaLanguages with mismatchesComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)PrefixCombinatorics on wordsDeterministic finite automaton010201 computation theory & mathematicsSuffix automatonIndexing020201 artificial intelligence & image processingSuffixComputer Science::Formal Languages and Automata TheoryComputer Science(all)
researchProduct

A basic analysis toolkit for biological sequences

2007

This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks. Namely, local alignments, via approximate string matching, and global alignments, via longest common subsequence and alignments with affine and concave gap cost functions. Moreover, it also supports filtering operations to select strings from a set and establish their statistical significance, via z-score computation. None of the algorithms is new, but although they are generally regarded as fundamental for sequence analysis, they have not been implemented in a single and consistent software package, as we do here. Therefore, our main contribution is to fill this gap between algorithmic theory an…

Theoretical computer sciencelcsh:QH426-470Computer sciencebusiness.industrysoftwareComputationApplied MathematicsString searching algorithmApproximate string matchingSoftware ArticleSet (abstract data type)Longest common subsequence problemlcsh:GeneticsSoftwareComputational Theory and Mathematicslcsh:Biology (General)Structural BiologyAffine transformationPerlbusinesscomputerMolecular Biologylcsh:QH301-705.5computer.programming_language
researchProduct

On the suffix automaton with mismatches

2007

International audience; In this paper we focus on the construction of the minimal deterministic finite automaton S_k that recognizes the set of suffixes of a word w up to k errors. We present an algorithm that makes use of S_k in order to accept in an efficient way the language of all suffixes of w up to k errors in every window of size r, where r is the value of the repetition index of w. Moreover, we give some experimental results on some well-known words, like prefixes of Fibonacci and Thue-Morse words, and we make a conjecture on the size of the suffix automaton with mismatches.

approximate string matchingFibonacci numberlanguages with mismatches[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]Generalized suffix treeBüchi automatonComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)0102 computer and information sciences02 engineering and technology01 natural sciencesCombinatoricsPrefixCombinatorics on wordsDeterministic finite automaton010201 computation theory & mathematics0202 electrical engineering electronic engineering information engineeringSuffix automaton020201 artificial intelligence & image processingsuffix automatacombinatorics on wordsComputer Science::Data Structures and Algorithmscombinatorics on words suffix automata languages with mismatches approximate string matchingWord (computer architecture)Computer Science::Formal Languages and Automata TheoryMathematics
researchProduct

On Table Arrangements, Scrabble Freaks, and Jumbled Pattern Matching

2010

Given a string s, the Parikh vector of s, denoted p(s), counts the multiplicity of each character in s. Searching for a match of Parikh vector q (a “jumbled string”) in the text s requires to find a substring t of s with p(t) = q. The corresponding decision problem is to verify whether at least one such match exists. So, for example for the alphabet Σ = {a, b, c}, the string s = abaccbabaaa has Parikh vector p(s) = (6,3,2), and the Parikh vector q = (2,1,1) appears once in s in position (1,4). Like its more precise counterpart, the renown Exact String Matching, Jumbled Pattern Matching has ubiquitous applications, e.g., string matching with a dyslectic word processor, table rearrangements, …

Discrete mathematicsParikh vectors jumbled pattern matching scrabble approximate pattern matching000AnagramParikh vectorsString searching algorithmApproximate string matchingDecision problemalgorithmsData structureJumbled Pattern MatchingSubstringscrabbleapproximate pattern matchingString MatchingWavelet TreePattern matchingMathematics
researchProduct

On Approximate Jumbled Pattern Matching in Strings

2011

Given a string s, the Parikh vector of s, denoted p(s), counts the multiplicity of each character in s. Searching for a match of a Parikh vector q in the text s requires finding a substring t of s with p(t) = q. This can be viewed as the task of finding a jumbled (permuted) version of a query pattern, hence the term Jumbled Pattern Matching. We present several algorithms for the approximate version of the problem: Given a string s and two Parikh vectors u, v (the query bounds), find all maximal occurrences in s of some Parikh vector q such that u <= q <= v. This definition encompasses several natural versions of approximate Parikh vector search. We present an algorithm solving this problem …

Parikh vectors: Average case analysiApproximate searchString algorithmsDiscrete mathematicsWeight functionanalysisSearch engine indexingParikh vectorsAverage case analysisApproximate string matchingSubstringString algorithmTheoretical Computer ScienceCombinatoricsComputational Theory and MathematicsString algorithms Pattern matching Parikh vectors Average case analysis Approximate search Permuted stringsPermuted stringsAverage caseTheory of computationWavelet TreePreprocessorPattern matchingPattern matchingMathematicsTheory of Computing Systems
researchProduct